Goto

Collaborating Authors

 Norfolk


A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows

Bandara, Eranga, Gore, Ross, Foytik, Peter, Shetty, Sachin, Mukkamala, Ravi, Rahman, Abdul, Liang, Xueping, Bouk, Safdar H., Hass, Amin, Rajapakse, Sachini, Keong, Ng Wee, De Zoysa, Kasun, Withanage, Aruna, Loganathan, Nilaan

arXiv.org Artificial Intelligence

Agentic AI marks a major shift in how autonomous systems reason, plan, and execute multi-step tasks. Unlike traditional single model prompting, agentic workflows integrate multiple specialized agents with different Large Language Models(LLMs), tool-augmented capabilities, orchestration logic, and external system interactions to form dynamic pipelines capable of autonomous decision-making and action. As adoption accelerates across industry and research, organizations face a central challenge: how to design, engineer, and operate production-grade agentic AI workflows that are reliable, observable, maintainable, and aligned with safety and governance requirements. This paper provides a practical, end-to-end guide for design-Email addresses: cmedawer@odu.edu We introduce a structured engineering lifecycle encompassing workflow decomposition, multi-agent design patterns, Model Context Protocol(MCP), and tool integration, deterministic orchestration, Responsible-AI considerations, and environment-aware deployment strategies. We then present nine core best practices for engineering production-grade agentic AI workflows, including tool-first design over MCP, pure-function invocation, single-tool and single-responsibility agents, externalized prompt management, Responsible-AI-aligned model-consortium design, clean separation between workflow logic and MCP servers, containerized deployment for scalable operations, and adherence to the Keep it Simple, Stupid (KISS) principle to maintain simplicity and robustness. To demonstrate these principles in practice, we present a comprehensive case study: a multimodal news-analysis and media-generation workflow. By combining architectural guidance, operational patterns, and practical implementation insights, this paper offers a foundational reference to build robust, extensible, and production-ready agentic AI workflows. Introduction The rapid advancement of Large Language Models (LLMs) [1, 2], Vision-Language Models (VLMs) [3, 4, 5], and tool-augmented reasoning has laid the foundation for a new paradigm in automation: agentic AI [6, 7]. Traditional LLM interactions follow a simple pattern in which a human provides a prompt and the model generates a response (as illustrated in the top half of Figure 1).


Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4

Makohon, Ivan, Najafi, Mohamad, Wu, Jian, Brochhausen, Mathias, Li, Yaohang

arXiv.org Artificial Intelligence

In the past decade a surge in the amount of electronic health record (EHR) data in the United States, attributed to a favorable policy environment created by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 and the 21st Century Cures Act of 2016. Clinical notes for patients' assessments, diagnoses, and treatments are captured in these EHRs in free-form text by physicians, who spend a considerable amount of time entering and editing them. Manually writing clinical notes takes a considerable amount of a doctor's valuable time, increasing the patient's waiting time and possibly delaying diagnoses. Large language models (LLMs) possess the ability to generate news articles that closely resemble human-written ones. We investigate the usage of Chain-of-Thought (CoT) prompt engineering to improve the LLM's response in clinical note generation. In our prompts, we use as input International Classification of Diseases (ICD) codes and basic patient information. We investigate a strategy that combines the traditional CoT with semantic search results to improve the quality of generated clinical notes. Additionally, we infuse a knowledge graph (KG) built from clinical ontology to further enrich the domain-specific knowledge of generated clinical notes. We test our prompting technique on six clinical cases from the CodiEsp test dataset using GPT-4 and our results show that it outperformed the clinical notes generated by standard one-shot prompts.


ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications

Bandara, Eranga, Hass, Amin, Gore, Ross, Shetty, Sachin, Mukkamala, Ravi, Bouk, Safdar H., Liang, Xueping, Keong, Ng Wee, De Zoysa, Kasun, Withanage, Aruna, Loganathan, Nilaan

arXiv.org Artificial Intelligence

AI agent-based systems are becoming increasingly integral to modern software architectures, enabling autonomous decision-making, dynamic task execution, and multimodal interactions through large language models (LLMs). However, these systems introduce novel and evolving security challenges, including prompt injection attacks, context poisoning, model manipulation, and opaque agent-to-agent communication, that are not effectively captured by traditional threat modeling frameworks. In this paper, we introduce ASTRIDE, an automated threat modeling platform purpose-built for AI agent-based systems. ASTRIDE extends the classical STRIDE framework by introducing a new threat category, A for AI Agent-Specific Attacks, which encompasses emerging vulnerabilities such as prompt injection, unsafe tool invocation, and reasoning subversion, unique to agent-based applications. To automate threat modeling, ASTRIDE combines a consortium of fine-tuned vision-language models (VLMs) with the OpenAI-gpt-oss reasoning LLM to perform end-to-end analysis directly from visual agent architecture diagrams, such as data flow diagrams(DFDs). LLM agents orchestrate the end-to-end threat modeling automation process by coordinating interactions between the VLM consortium and the reasoning LLM. Our evaluations demonstrate that ASTRIDE provides accurate, scalable, and explainable threat modeling for next-generation intelligent systems. To the best of our knowledge, ASTRIDE is the first framework to both extend STRIDE with AI-specific threats and integrate fine-tuned VLMs with a reasoning LLM to fully automate diagram-driven threat modeling in AI agent-based applications.


A Hierarchical Hybrid AI Approach: Integrating Deep Reinforcement Learning and Scripted Agents in Combat Simulations

Black, Scotty, Darken, Christian

arXiv.org Artificial Intelligence

In the domain of combat simulations in support of wargaming, the development of intelligent agents has predominantly been characterized by rule-based, scripted methodologies with deep reinforcement learning (RL) approaches only recently being introduced. While scripted agents offer predictability and consistency in controlled environments, they fall short in dynamic, complex scenarios due to their inherent inflexibility. Conversely, RL agents excel in adaptability and learning, offering potential improvements in handling unforeseen situations, but suffer from significant challenges such as black-box decision-making processes and scalability issues in larger simulation environments. This paper introduces a novel hierarchical hybrid artificial intelligence (AI) approach that synergizes the reliability and predictability of scripted agents with the dynamic, adaptive learning capabilities of RL. By structuring the AI system hierarchically, the proposed approach aims to utilize scripted agents for routine, tactical-level decisions and RL agents for higher-level, strategic decision-making, thus addressing the limitations of each method while leveraging their individual strengths. This integration is shown to significantly improve overall performance, providing a robust, adaptable, and effective solution for developing and training intelligent agents in complex simulation environments.



T^2Agent A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search

Cui, Xing, Zou, Yueying, Li, Zekun, Li, Peipei, Xu, Xinyuan, Liu, Xuannan, Huang, Huaibo

arXiv.org Artificial Intelligence

Real-world multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification. However, existing methods mainly rely on static pipelines and limited tool usage, limiting their ability to handle such complexity and diversity. To address this challenge, we propose \method, a novel misinformation detection agent that incorporates an extensible toolkit with Monte Carlo Tree Search (MCTS). The toolkit consists of modular tools such as web search, forgery detection, and consistency analysis. Each tool is described using standardized templates, enabling seamless integration and future expansion. To avoid inefficiency from using all tools simultaneously, a greedy search-based selector is proposed to identify a task-relevant subset. This subset then serves as the action space for MCTS to dynamically collect evidence and perform multi-source verification. To better align MCTS with the multi-source nature of misinformation detection, \method~ extends traditional MCTS with multi-source verification, which decomposes the task into coordinated subtasks targeting different forgery sources. A dual reward mechanism containing a reasoning trajectory score and a confidence score is further proposed to encourage a balance between exploration across mixed forgery sources and exploitation for more reliable evidence. We conduct ablation studies to confirm the effectiveness of the tree search mechanism and tool usage. Extensive experiments further show that \method~ consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks, demonstrating its strong potential as a training-free detector.


Mysterious drones have been spotted at night at airports across Europe. How worried should we be?

BBC News

Mysterious drones have been spotted at night at airports across Europe. How worried should we be? First comes the warning, that disembodied voice over the tannoy: Your attention please. Please move to the shelter on the minus second floor. Then comes the mosquito-like whine of the incoming Russian drones, massing in their hundreds just above the clouds.


Towards High Resolution Probabilistic Coastal Inundation Forecasting from Sparse Observations

Islam, Kazi Ashik, Mehrab, Zakaria, Halappanavar, Mahantesh, Mortveit, Henning, Katragadda, Sridhar, Loftis, Jon Derek, Hoops, Stefan, Marathe, Madhav

arXiv.org Artificial Intelligence

Coastal flooding poses increasing threats to communities worldwide, necessitating accurate and hyper-local inundation forecasting for effective emergency response. However, real-world deployment of forecasting systems is often constrained by sparse sensor networks, where only a limited subset of locations may have sensors due to budget constraints. To approach this challenge, we present DIFF -SPARSE, a masked conditional diffusion model designed for probabilistic coastal inundation forecasting from sparse sensor observations. DIFF -SPARSE primarily utilizes the inundation history of a location and its neighboring locations from a context time window as spatiotemporal context. The fundamental challenge of spatiotemporal prediction based on sparse observations in the context window is addressed by introducing a novel masking strategy during training. Digital elevation data and temporal co-variates are utilized as additional spatial and temporal contexts, respectively. A convolutional neural network and a conditional UNet architecture with cross-attention mechanism are employed to capture the spatiotemporal dynamics in the data. We trained and tested DIFF -SPARSE on coastal inundation data from the Eastern Shore of Virginia and systematically assessed the performance of DIFF -SPARSE across different sparsity levels 0%, 50%, 95% missing observations. Our experiment results show that DIFF -SPARSE achieves upto 62% improvement in terms of two forecasting performance metrics compared to existing methods, at 95% sparsity level. Moreover, our ablation studies reveal that digital elevation data becomes more useful at high sparsity levels compared to temporal co-variates.


Comparing EPGP Surrogates and Finite Elements Under Degree-of-Freedom Parity

Amo, Obed, Ghosh, Samit, Lange-Hegermann, Markus, Raiţă, Bogdan, Pokojovy, Michael

arXiv.org Machine Learning

We present a new benchmarking study comparing a boundary-constrained Ehrenpreis--Palamodov Gaussian Process (B-EPGP) surrogate with a classical finite element method combined with Crank--Nicolson time stepping (CN-FEM) for solving the two-dimensional wave equation with homogeneous Dirichlet boundary conditions. The B-EPGP construction leverages exponential-polynomial bases derived from the characteristic variety to enforce the PDE and boundary conditions exactly and employs penalized least squares to estimate the coefficients. To ensure fairness across paradigms, we introduce a degrees-of-freedom (DoF) matching protocol. Under matched DoF, B-EPGP consistently attains lower space-time $L^2$-error and maximum-in-time $L^{2}$-error in space than CN-FEM, improving accuracy by roughly two orders of magnitude.


Standardization of Psychiatric Diagnoses -- Role of Fine-tuned LLM Consortium and OpenAI-gpt-oss Reasoning LLM Enabled Decision Support System

Bandara, Eranga, Gore, Ross, Yarlagadda, Atmaram, Clayton, Anita H., Samuel, Preston, Rhea, Christopher K., Shetty, Sachin

arXiv.org Artificial Intelligence

The diagnosis of most mental disorders, including psychiatric evaluations, primarily depends on dialogues between psychiatrists and patients. This subjective process can lead to variability in diagnoses across clinicians and patients, resulting in inconsistencies and challenges in achieving reliable outcomes. To address these issues and standardize psychiatric diagnoses, we propose a Fine-Tuned Large Language Model (LLM) Consortium and OpenAI-gpt-oss Reasoning LLM-enabled Decision Support System for the clinical diagnosis of mental disorders. Our approach leverages fine-tuned LLMs trained on conversational datasets involving psychiatrist-patient interactions focused on mental health conditions (e.g., depression). The diagnostic predictions from individual models are aggregated through a consensus-based decision-making process, refined by the OpenAI-gpt-oss reasoning LLM. We propose a novel method for deploying LLM agents that orchestrate communication between the LLM consortium and the reasoning LLM, ensuring transparency, reliability, and responsible AI across the entire diagnostic workflow. Experimental results demonstrate the transformative potential of combining fine-tuned LLMs with a reasoning model to create a robust and highly accurate diagnostic system for mental health assessment. A prototype of the proposed platform, integrating three fine-tuned LLMs with the OpenAI-gpt-oss reasoning LLM, was developed in collaboration with the U.S. Army Medical Research Team in Norfolk, Virginia, USA. To the best of our knowledge, this work represents the first application of a fine-tuned LLM consortium integrated with a reasoning LLM for clinical mental health diagnosis paving the way for next-generation AI-powered eHealth systems aimed at standardizing psychiatric diagnoses.